J Clin Epidemiol Vol. 44, No. 2, pp. 127-139, 1991
Printed in Great Britain


0895-4356/91 $3.00 + 0.00
Pergamon Press plc


META-ANALYSIS IN EPIDEMIOLOGY, WITH SPECIAL 
REFERENCE TO STUDIES OF THE ASSOCIATION 
BETWEEN EXPOSURE TO ENVIRONMENTAL TOBACCO 
SMOKE AND LUNG CANCER: A CRITIQUE 

Joseph L. Fleiss¹ and Alan J. Gross²

¹Columbia University, School of Public Health, 600 West 168 Street, New York, NY 10032 and
²Medical University of South Carolina, Charleston, SC 29425, USA

(Received in revised form 29 August 1990)


Abstract—Meta-analysis, a set of statistical tools for combining and integrating the
results of independent studies of a given scientific issue, can be useful when the stringent
conditions under which such integration is valid are met. In this report we point out
the difficulties in obtaining sound meta-analyses of either controlled clinical trials or
epidemiological studies. We demonstrate that hastily or improperly designed meta-analyses
can lead to results that may not be scientifically valid. We note that much care
is typically taken when meta-analysis is applied to the results of clinical trials. The Food
and Drug Administration, for example, requires strict adherence to the principles we
discuss in this paper before it allows a drug's sponsor to use a meta-analysis of separate
clinical studies in support of a New Drug Application.

Such care does not always carry over to epidemiological studies, as demonstrated by 
the 1986 report of the National Research Council concerning the purported association 
between exposure to environmental tobacco smoke and the risk of lung cancer. On the
basis of a meta-analysis of 13 studies, 10 of which were retrospective and the remaining
3 prospective in nature, the Council concluded that non-smokers who are exposed to
environmental tobacco smoke are at greater risk of acquiring lung cancer than
non-smokers not so exposed. In our opinion, this conclusion is unwarranted given the
poor quality of the studies on which it is based. 


I. INTRODUCTION

A working definition of meta-analysis is given
by Huque [1]: “... the term ‘meta-analysis’
refers to a statistical analysis which combines or
integrates the results of several independent
clinical trials, considered by the analyst to be
‘combinable’.” As indicated by this characterization
of meta-analysis, its key application is to
be found in the analysis and synthesis of data
from clinical trials.

The question then remains, can meta-analytic 
techniques be applied in the analysis of other 
kinds of data such as those that arise in cohort 
and case-control studies found in epidemiology?
The answer to this question is a guarded
“yes.” The criteria for reaching this affirmative
answer are now considered. 

In applications of meta-analysis to clinical
trials, the following questions arise, among many
others that must be addressed:

• Are all studies to be included in the meta-analysis,
or only the published ones?

• Are all published studies to be included in 
the meta-analysis, or only the “good” ones? 

• When the studies’ results are heterogeneous,
how may they be included in a
meta-analysis, or should they be meta-analyzed
at all?

• Within each study, should all subjects in 
a treatment group be considered in a 
meta-analysis, or only those subjects who
were compliant with the treatment? The 
same question applies to subjects in the 
control group. 

Similar issues are of concern in epidemiological
studies. However, when case-control
studies are under consideration, the issue of
“intention-to-treat,” which is the final question
just listed with regard to meta-analysis in clinical
trials, is not of direct concern. Instead, the
following additional question needs to be
addressed in a meta-analysis of case-control
epidemiological studies:

• Has proper control or adjustment been
made for the biases that frequently occur in
epidemiological studies, such as sociodemographic
or clinical differences between
study populations, misclassification of subjects
with regard to case-control status and
to levels of exposure, factors other than the
level of exposure that may affect whether a
subject is a case or control (i.e. confounding
variables), and the publication bias/file
drawer phenomenon wherein studies that
fail to show a positive association tend not
to be published and are thus not candidates
for inclusion in the meta-analysis?

Meta-analysis was first applied to the study of 
psychotherapy and to the study of educational 
interventions (see Hedges and Olkin [2], for 
example), and is now widely used to provide 
overviews of randomized controlled clinical 
trials. It is also applied in the synthesis of data 
from epidemiological case-control studies, but, 
as will be covered in Section III, with uncertain 
theoretical justification. 

Among the principal uses of a properly performed
meta-analysis are:

• To increase statistical power for important
endpoints and subgroups.

• To resolve controversy when studies disagree.

• To improve estimates of effect size. 

• To answer new questions that were not 
previously posed in the individual studies. 

The application of meta-analysis to randomized
clinical trials and to epidemiological
case-control studies is the topic of Sections II
and III of this paper, respectively. Special emphasis will
be placed on epidemiological studies of the
hypothesized association between exposure to
environmental tobacco smoke and lung cancer.


II. APPLICATIONS OF META-ANALYSIS
TO CLINICAL TRIALS 

Before reviewing and criticizing the application
of meta-analysis to epidemiological studies,
it is worthwhile to review and critique its
application to a methodologically stronger kind
of study, the randomized controlled clinical
trial. We shall identify a number of areas of
uncertainty and controversy concerning such
applications, present the points on which consensus
seems to exist, and then use the results of
this review as a template for our critique of
meta-analyses of studies in epidemiology.

Analyze all published studies or only the “good”
ones?

In their review of published meta-analyses,
Sacks et al. [3] found that nearly 30% of them
combined results from both randomized and
non-randomized studies. If there is unanimity
among meta-analytic methodologists on any
issue, however, it is on the requirement that only
randomized clinical trials be included in a meta-analysis
[4-8]. These experts take it as axiomatic
that the potential for bias in the non-randomized
assignment of patients to treatment groups
is too great for the results of such studies to be
trusted. There exist examples of non-randomized
studies that are, in other respects, superior
in quality to randomized studies [9], but the
concern about the quality of non-randomized
studies in general is a valid one. The principle
that meta-analyses be restricted to randomized
studies is by-and-large appropriate.

Having agreed on the criterion of non-randomized
treatment assignment for excluding
a study from a meta-analysis, the experts disagree
on other possible criteria for excluding
studies (absence of double-blinding, efficacy
rather than intention-to-treat analysis, study is
out of date, etc.), and indeed disagree on
whether any randomized trial, no matter how
poorly designed, should ever be excluded
[7,8,10]. Hedges, for example, suggests that it is
standard procedure in meta-analyses in the
physical sciences to delete experiments that are
deemed to be flawed [11]. H. J. Eysenck, a
British psychologist and philosopher of science,
labels as “mega-silliness” the practice of including
methodologically inadequate research in a
meta-analysis [12].

Chalmers and his colleagues have developed
rigorous, reproducible and unbiased methods
for measuring the quality of a study [13]. One
may use the derived measurements to decide
whether to accept or reject the study for a
meta-analysis, to determine in a formal [14] or
informal [15] way whether a study’s quality and
its estimate of treatment effect are correlated,
or to weight studies differentially according to
their measured qualities (such a suggestion was
recently made by Jenicek [16]). One way to
carry out this latter strategy is to modify the
meaning of the weight to be assigned to a
study's estimated treatment effect, e, in the
weighted average

$$\bar{e} = \frac{\sum w e}{\sum w}.$$

In most applications of meta-analysis, w is
roughly proportional to the total number of
patients in the study. If q is the study’s value on
the measure of quality, perhaps scaled to vary
from 0 to 1, the proposed overall measure of the
effect of treatment is

$$\bar{e} = \frac{\sum (qw)\, e}{\sum (qw)}.$$

A large study whose quality is good will receive
all or nearly all of the weight it is entitled to,
whereas a study of comparable size but measurably
poorer quality will have its contribution to
the weighted average correspondingly reduced.

We are not aware of any meta-analyses in 
which the measures of quality have actually 
been formally incorporated into the analysis. 
The proposed method is novel, and worth a 
critical evaluation. 
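
The quality-weighted average described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weighted_effect`, the three studies, their patient counts and their quality scores are all hypothetical and do not come from any of the cited meta-analyses.

```python
from typing import Optional, Sequence


def weighted_effect(
    effects: Sequence[float],
    weights: Sequence[float],
    qualities: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of study effects: sum(q*w*e) / sum(q*w).

    With qualities omitted, every q is 1 and this is the usual sum(w*e) / sum(w).
    """
    if qualities is None:
        qualities = [1.0] * len(effects)
    if not (len(effects) == len(weights) == len(qualities)):
        raise ValueError("effects, weights and qualities must have equal length")
    if any(q < 0.0 or q > 1.0 for q in qualities):
        raise ValueError("quality scores are assumed to lie in [0, 1]")
    adjusted = [q * w for q, w in zip(qualities, weights)]
    total = sum(adjusted)
    if total <= 0.0:
        raise ValueError("total adjusted weight must be positive")
    return sum(a * e for a, e in zip(adjusted, effects)) / total


# Hypothetical inputs: three studies, weights taken as total patient counts.
effects = [0.30, 0.10, 0.50]
weights = [200.0, 1000.0, 150.0]
qualities = [0.9, 0.4, 1.0]  # hypothetical quality scores on a 0-to-1 scale

plain = weighted_effect(effects, weights)
quality_adjusted = weighted_effect(effects, weights, qualities)
print(f"size-weighted: {plain:.3f}, quality-adjusted: {quality_adjusted:.3f}")
```

In this made-up example the large but low-quality second study dominates the size-weighted average (about 0.174), whereas scaling its weight by q = 0.4 moves the summary toward the two better studies (about 0.232).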

Analyze all studies or only the published ones? 

When planning a meta-analysis, the investigator
must decide whether to analyze only those
studies that had been published or to analyze as
well studies that had not been published. Apparently,
no consensus has yet been reached concerning
which strategy to adopt. Yusuf [8]
recommends that the results of both published
and unpublished controlled trials be considered
for analysis; he and his colleagues were successful
in locating several unpublished studies in
an influential meta-analysis of the effects of
beta blockers after myocardial infarction [17].
Abstracts of papers presented at meetings and
abstracts of master’s theses and doctoral dissertations
were perused in the pioneering meta-analyses
of studies evaluating the efficacy of
psychotherapy [18]. Chalmers et al. [4] have
warned, however, that even a systematic and
rigorous attempt at obtaining the results of all
unpublished studies may produce bias or unduly
decrease precision.


There are two main reasons for including
unpublished studies in a meta-analysis. One
is to overcome “publication bias,” the acknowledged
tendency of reviewers to recommend
against and of editors to decide against publishing
studies that failed to show an effect of
treatment. The other is to overcome the bias due
to the related “file drawer phenomenon,” the
tendency on the part of the author not even to
bother submitting for publication an article that
fails to show an effect [19]. If either of these two
sources of bias operates, then a meta-analysis
only of results reported in published articles will
tend to overstate the degree of statistical significance
of the treatment’s effect, and to overestimate
that effect. An alternative to the
difficult task of searching for unpublished studies
is to attempt to undo this bias by applying
statistical adjustments to the data [20,21]. These
statistical procedures are still too new for them
to have been theoretically and empirically
evaluated. Thus, effectively, the publication
bias/file drawer issue remains a serious problem
in performing a meta-analysis. The careful
and thorough search for unpublished
studies is an expensive and time consuming
endeavor, and may uncover studies of uncertain
quality, but no validated alternative is currently
available.

Analyze only homogeneous studies? 

Some statistical reviewers at the U.S. Food
and Drug Administration have strongly criticized
the pooling of results from controlled
clinical trials in which there is heterogeneity of
treatment effect (i.e. sizable differences exist
between studies in their estimates of the effect of
treatment) and have suggested that it is valid to
combine results only from studies in which the
estimates are sufficiently close one to another
[22,23]. Stein, in fact, denigrated as a mere
“computational exercise” the meta-analysis of
studies in which the estimated treatment effects
were heterogeneous [23]. Sacks et al. refer to this
criterion as combinability [3].

Fairly straightforward statistical methods
exist to test the hypothesis of heterogeneity for
both continuous measurements [24] and categorical
data [25] (see the statistical Appendix),
although not all FDA reviewers are in agreement
as to how strict the statistical criteria
should be for deciding that different studies are
combinable [1]. Furthermore, it is not clear that
these reviewers would accept as evidence for
efficacy the finding of a statistically significant
pooled effect even if the meta-analysis was
restricted to studies that were combinable.

Some justification may be granted to the
tough stand taken by these FDA reviewers, as
they are responsible for interpreting and applying
regulations handed down to them. In settings
other than regulatory ones, however, it is
not obvious that the criterion of combinability
must always be satisfied before a meta-analysis
may be applied. DeMets, for example, questions
the meaning that attaches to the overall results
of a meta-analysis when there is heterogeneity
across studies [26]. Others, however, suggest
that it is precisely when studies differ with
respect to the magnitude and perhaps even the
direction of treatment effect that the formal
methods of meta-analysis are needed to summarize
in an unbiased manner all of the information
available to date [6,27]. With respect to
the possibility that the effect of a treatment is
strongly positive in one study and strongly
negative in another, Peto states that “(this)
situation ... would be unusual, although certainly
not impossible” [6, p. 233]. The frequency
with which such a qualitative interaction occurs
may be greater than he and others (including the
two authors of this paper) have believed. A
recent randomized controlled trial of the post-infarction
effect of a calcium channel blocker,
for example, found this very kind of interaction
[28].

Whether and how to carry out a meta-analysis
in the presence of heterogeneous effects are
still unanswered questions. There appears to be
only one point on which there is agreement: it
is invalid to delete from the set of studies to be
meta-analyzed those whose results are in the
“wrong direction,” for the opportunity for bias
in identifying the “deviant” studies is too great.
Furthermore, one would be left drawing the
inane conclusion that “in those studies in which
the treatment effects were in the same direction
(all positive, all negative, or all close to zero),
the overall effect of treatment was also in that
direction.”

Intention-to-treat vs efficacy analysis 

The controversy that exists concerning the 
appropriate samples of patients to be analyzed 
within a single trial carries over into the realm 
of meta-analysis. According to the intention-to- 
treat principle, patients are to be analyzed 
within the treatment groups they were randomly 
assigned to, no matter how much or how little 
treatment they actually received [29]. 


In an efficacy analysis, on the other hand, 
only data from compliant patients are analyzed 
[30]. Sacks et al. found, in their review, that
9 of 19 meta-analyses that considered this
issue restricted their analyses to studies that
employed the intention-to-treat principle, and 9
analyzed data from either kind of study [3].
Only one meta-analysis restricted attention to
studies in which efficacy analyses were performed.
Because efficacy analyses tend to produce
overestimates of a treatment's effect, and
intention-to-treat analyses tend to produce
underestimates, caution suggests that, when
sufficient information is provided to ascertain
which approach was used, only studies that
analyzed data from the more conservative
intention-to-treat perspective be included in a
meta-analysis. When studies that performed
only efficacy analyses are included in a meta-analysis,
one may expect that some degree of
bias toward greater significance and toward an 
overestimation of the effect of treatment is 
present. 

Data presentation 

Most theorists and many practitioners of 
meta-analysis agree that a graphic display of the 
individual studies’ results and of the overall,
pooled result is invaluable. Most often, each 
study’s estimated treatment effect (expressed as 
a relative risk when the outcome is morbidity or 
mortality) is marked by a circle or tick mark, 
and 95 or 99% confidence limits about the 
estimate are displayed as straight lines extending 
to the left and to the right of the point estimate 
(see Fig. 1). The several studies' lines appear one 
above the other, and the last line indicates the 
value of the summary estimate pooled across all 
individual studies, along with its confidence
limits. Important examples from overviews
of clinical trials of the effectiveness of beta-blockers
in reducing the risk of mortality
after myocardial infarction have been provided
by Baber and Lewis [31] and by Yusuf
et al. [17]. Studies may be separated by the
time during which they were carried out (early
or late during the development of a treatment
[31]), or by other characteristics of the
trials.

Fig. 1. Odds ratios and 95% confidence intervals for seven ...

Peto has proposed an alternate method
for the graphical presentation of the individual
studies’ results [32], one that is less
intuitively understandable but that lends itself
more directly to sophisticated statistical analysis.
For each study, the values of two statistics
are calculated. One is the difference, O-E,
between the observed number of treated
patients who experienced the outcome event,
O, and the number expected to have done
so under the hypothesis that the treatment is
no different from the control, E. The second
statistic, V, is the variance of the difference
O-E.

With the ordinate of a pair of ordinary rectangular
axes representing O-E and the abscissa
representing V, each study is represented by a
point. If a straight line passing through or near
the origin provides an adequate fit to these
points, the meta-analyst may conclude that the
studies' odds ratios are approximately equal,
with their average value estimable, roughly, as
the antilogarithm of the slope of that line. A
more precise estimator of the summary odds
ratio is

$$\mathrm{OR} = \exp\left[\frac{\sum (O - E)}{\sum V}\right],$$

the antilogarithm of the ratio of the sum
of all the O-E values to the sum of all the
V values. (When OR is outside of the interval
from 0.2 to 5.0, this formula may be
inaccurate [33], and should be replaced by the
more accurate formulas due to Mantel and
Haenszel [34] or by those presented in the
Appendix.) The statistical significance of the
estimated odds ratio may be tested by referring
the quantity

$$Z = \frac{\sum (O - E)}{\sqrt{\sum V}}$$

to the standard normal distribution and declaring
the summary odds ratio to be statistically
significant if Z is sufficiently large. The lower
and upper limits of an approximate 95% confidence
interval for the population odds ratio,
finally, are given by

$$\exp\left[\frac{\sum (O - E) \pm 1.96\sqrt{\sum V}}{\sum V}\right].$$

The reader should note that the confidence
interval so obtained is not symmetric about the
point estimate.
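
The O-E calculation can be sketched as follows. The text does not spell out how E and V are obtained from a trial's counts, so the sketch assumes the usual hypergeometric expectation and variance for a 2 x 2 table; the function names and the two small trials are hypothetical and serve only to exercise the three formulas (summary OR, Z, and confidence limits).

```python
import math
from typing import Dict, Iterable, Tuple

# (events_treated, n_treated, events_control, n_control)
Table = Tuple[int, int, int, int]


def o_minus_e_and_v(table: Table) -> Tuple[float, float]:
    """Return (O - E, V) for one trial under the hypothesis of no treatment effect."""
    events_t, n_t, events_c, n_c = table
    n = n_t + n_c
    d = events_t + events_c
    expected = n_t * d / n  # E: events expected among treated patients
    # Assumption: hypergeometric variance of O given the table margins.
    variance = n_t * n_c * d * (n - d) / (n * n * (n - 1))
    return events_t - expected, variance


def peto_summary(tables: Iterable[Table], z_crit: float = 1.96) -> Dict[str, float]:
    """Summary OR = exp(sum(O-E)/sum(V)), with Z and approximate confidence limits."""
    parts = [o_minus_e_and_v(t) for t in tables]
    sum_oe = sum(oe for oe, _ in parts)
    sum_v = sum(v for _, v in parts)
    root_v = math.sqrt(sum_v)
    return {
        "or": math.exp(sum_oe / sum_v),
        "z": sum_oe / root_v,
        "lower": math.exp((sum_oe - z_crit * root_v) / sum_v),
        "upper": math.exp((sum_oe + z_crit * root_v) / sum_v),
    }


# Two hypothetical trials, not taken from any cited study.
trials = [(15, 100, 25, 100), (30, 200, 40, 200)]
result = peto_summary(trials)
print({name: round(value, 3) for name, value in result.items()})
```

Because the limits are computed on the logarithmic scale and then exponentiated, the upper limit lies farther from the point estimate than the lower one, which is the asymmetry noted in the text.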

Studies fixed vs studies random 

A debate that is far from being resolved
concerns the issue of how, if at all, interstudy
differences in the magnitudes of the estimated
treatment effects are to be taken into account in
the meta-analysis. With only a limited number
of exceptions [35], virtually all meta-analyses
have ignored differences in estimated effects
between studies (except to describe them qualitatively),
and have used in the analysis only
within-study measures of precision. Thus, as an
example, if in one meta-analysis there are two
published studies with ORs of 1.0 and 6.0, if in
another there are two published studies with
ORs of 2.0 and 3.0, and if all four values of V
(the variance of the logarithm of the OR) are
equal to 0.01, then in both meta-analyses the value
of the pooled OR will be 2.45 and in both
meta-analyses the approximate 95% confidence intervals
extend from 2.13 to 2.81. No cognizance is
taken of the obvious fact that the two studies in 
the first meta-analysis are much further apart 
than the two studies in the second. If each study 
were so large that the sampling variances were 
all equal to zero, both confidence intervals 
would degenerate to the single value of 2.45. 
The conclusion in both cases would be that the 
odds ratio was known with certainty to equal 
2.45, although this cannot possibly be correct 
given that the individual ORs do not equal one 
another. 
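
The arithmetic behind the two hypothetical meta-analyses can be checked with a short sketch of inverse-variance (fixed effects) pooling on the logarithmic scale. The function name `fixed_effect_or` is an assumption made for illustration; the ORs and variances are the ones quoted in the text.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_or(
    odds_ratios: Sequence[float],
    variances: Sequence[float],
    z_crit: float = 1.96,
) -> Tuple[float, float, float]:
    """Pool ORs on the log scale with weights 1/V; return (OR, lower, upper)."""
    weights = [1.0 / v for v in variances]
    logs = [math.log(o) for o in odds_ratios]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, logs)) / total
    se = math.sqrt(1.0 / total)  # uses within-study precision only
    return (
        math.exp(pooled),
        math.exp(pooled - z_crit * se),
        math.exp(pooled + z_crit * se),
    )


first = fixed_effect_or([1.0, 6.0], [0.01, 0.01])
second = fixed_effect_or([2.0, 3.0], [0.01, 0.01])
print([round(x, 2) for x in first])   # [2.45, 2.13, 2.81]
print([round(x, 2) for x in second])  # [2.45, 2.13, 2.81]
```

The standard error depends only on the sum of the weights, so the spread between 1.0 and 6.0 leaves the interval exactly as narrow as the spread between 2.0 and 3.0.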

Peto, on the one hand, nevertheless asserts
that this is precisely as things should be [36],
whereas Meier, on the other, argues that interstudy
variation is a key feature of the data and
should contribute to the analysis [37]. In the
jargon of the analysis of variance, Peto’s perspective
is that the studies represent levels of a
fixed factor whereas Meier’s is that they represent
levels of a random factor. DeMets [26]
and Bailey [38] discuss the pros and cons of
these two competing statistical models, with
Bailey presenting those circumstances, assumptions
and research questions under which one or
the other perspective is the more appropriate.

Bailey [38] suggests that, when the research
question concerns whether the treatment will 
have an effect, on the average, or whether 
exposure to a hypothesized risk factor will cause 
disease, on the average, then the model of 
studies being random is the appropriate one. 
When the question concerns whether treatment 
has produced an effect, on the average, or 
whether exposure has caused disease, on the 
average, in the studies at hand, then the model
of studies being fixed is the appropriate one. The
former question implicitly assumes that there
is a population of studies from which those
included in the meta-analysis were sampled. It
anticipates the possibility of future studies being
conducted, or even previously unknown studies
being uncovered. The latter question assumes
that only the studies included in the meta-analysis
are of interest, and that there is no interest
in generalizing the result to other studies. The
former question, in our opinion, is usually the 
more important of the two. 

In the first of the two hypothetical meta-analyses,
the random effects model yields an
approximate 95% confidence interval extending
from below 0.5 to above 10.0. In the second, it
yields an approximate 95% confidence interval
extending from 1.65 to 3.64. These intervals
were constructed using the method of DerSimonian
and Laird [39] (see the Appendix). In
our opinion, the difference between the two
intervals based on the random effects model
accurately reflects the difference that exists
between the two pairs of studies, whereas the
equality of the more traditional intervals based
on the fixed effects model does not.
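
A minimal sketch of a random effects calculation for the two hypothetical pairs follows. The Appendix is not reproduced here, so the sketch assumes the commonly described moment estimator of the between-study variance attributed to DerSimonian and Laird; the function name and return layout are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    odds_ratios: Sequence[float],
    variances: Sequence[float],
    z_crit: float = 1.96,
) -> Tuple[float, float, float, float]:
    """Random effects pooling of log ORs; return (OR, lower, upper, tau2)."""
    y = [math.log(o) for o in odds_ratios]
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Homogeneity statistic and moment estimate of the between-study variance.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, y))
    tau2 = 0.0
    if len(y) > 1:
        c = sum_w - sum(wi * wi for wi in w) / sum_w
        tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # Each study's variance is inflated by tau2 before re-weighting.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mean = sum(wi * yi for wi, yi in zip(w_star, y)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return (
        math.exp(mean),
        math.exp(mean - z_crit * se),
        math.exp(mean + z_crit * se),
        tau2,
    )


wide = dersimonian_laird([1.0, 6.0], [0.01, 0.01])
narrow = dersimonian_laird([2.0, 3.0], [0.01, 0.01])
print([round(x, 2) for x in wide[:3]])
print([round(x, 2) for x in narrow[:3]])  # [2.45, 1.65, 3.64]
```

Under these assumptions the first pair yields limits below 0.5 and above 10.0 and the second yields 1.65 to 3.64, in line with the intervals quoted in the text.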

The potential for fragility in meta-analysis 

It is possible for a single study to exert a
powerful influence on the results of a meta-analysis.
A striking example is provided by the
meta-analysis of randomized trials of the effectiveness
of aspirin in preventing death after a
myocardial infarction [38]. Summary results are
presented in Table 1 and in Fig. 1 for seven trials
carried out between 1976 and 1988. The first five
[40-44] constituted a homogeneous set (the
value of the chi-square statistic for homogeneity
of ORs was 0.62 with 4 df, far from statistical
significance), for which the value of the summary
OR for aspirin vs control was 0.76 (statistically
significant at p < 0.01), with a 95%
confidence interval extending from 0.65 to 0.90.

The next trial, the Aspirin Myocardial Infarction
Study (AMIS) [45], changed the picture
radically. Its OR of 1.13, while not significantly
different from unity, was significantly different
from the earlier pooled OR of 0.76 (χ² = 9.31,
df = 1, p < 0.01). The value of the summary
OR across the first six studies was a statistically
non-significant 0.90 (p > 0.10), with a 95%
confidence interval extending from 0.80 to 1.02.
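
The homogeneity statistic for the first five trials, and the comparison of AMIS with their pooled result, can be approximately reproduced from the y = ln(OR) and w = 1/Var(y) columns of Table 1. The sketch below is one plausible reading of those calculations, not necessarily the authors' exact procedure: homogeneity is taken as the weighted sum of squared deviations from the pooled log OR, and the AMIS comparison as a squared difference divided by the sum of the two variances. Function names are illustrative.

```python
import math
from typing import List, Tuple

# (y = ln(OR), w = 1/Var(y)) as listed in Table 1.
FIRST_FIVE: List[Tuple[float, float]] = [
    (-0.329, 25.710),  # MRC-1
    (-0.384, 24.291),  # CDP
    (-0.219, 48.801),  # MRC-2
    (-0.222, 15.440),  # GASP
    (-0.226, 28.409),  # PARIS
]
AMIS: Tuple[float, float] = (0.125, 103.985)


def pooled_log_or(studies: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the inverse-variance pooled log OR and its variance."""
    sum_w = sum(w for _, w in studies)
    mean = sum(w * y for y, w in studies) / sum_w
    return mean, 1.0 / sum_w


def homogeneity_chi2(studies: List[Tuple[float, float]]) -> float:
    """Weighted sum of squared deviations from the pooled log OR (df = k - 1)."""
    mean, _ = pooled_log_or(studies)
    return sum(w * (y - mean) ** 2 for y, w in studies)


def contrast_chi2(y_a: float, var_a: float, y_b: float, var_b: float) -> float:
    """One-df chi-square comparing two independent log OR estimates."""
    return (y_a - y_b) ** 2 / (var_a + var_b)


mean5, var5 = pooled_log_or(FIRST_FIVE)
q5 = homogeneity_chi2(FIRST_FIVE)
amis_vs_five = contrast_chi2(AMIS[0], 1.0 / AMIS[1], mean5, var5)
print(f"pooled OR = {math.exp(mean5):.2f}, homogeneity chi2 = {q5:.2f} (4 df)")
print(f"AMIS vs first five: chi2 = {amis_vs_five:.2f} (1 df)")
```

With the rounded table entries this gives a pooled OR of 0.76, a homogeneity chi-square of about 0.62, and a contrast chi-square of about 9.3, which exceeds 6.63, the 1% critical value of chi-square with 1 df.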

The confidence intervals for the first five
studies and for the first six demonstrate the
paradox discussed earlier that as one's uncertainty
as to the value of the overall OR increases,
the length of the confidence interval
based on the fixed effects model decreases. The
estimates and confidence intervals provided by
the DerSimonian-Laird random effects model
[39] for the six studies are more valid given the
degree of heterogeneity that exists across them.
The estimated odds ratio has a borderline
significant value of 0.84 (χ² = 3.05, df = 1,
p < 0.10), and the associated 95% confidence
interval extends from 0.70 to 1.02. The length
of the DerSimonian-Laird interval is, in logarithmic
units, ln(upper limit/lower limit) = ln(1.02/0.70) = 0.38,
appropriately greater than
both the length of the fixed effects interval for
the first five studies (ln(0.90/0.65) = 0.32) and
that for the first six (ln(1.02/0.80) = 0.24). (The
fixed effects and random effects analyses of the
first five studies yield identical results.)

Table 1. Results of seven placebo-controlled randomized studies of the effect of
aspirin in preventing death after myocardial infarction

                  No. deaths/No. patients
Study             Aspirin       Placebo        OR      y = ln(OR)   w = 1/Var(y)
MRC-1 [40]        49/615        67/624         0.720   -0.329         25.710
CDP [41]          44/758        64/771         0.681   -0.384         24.291
MRC-2 [42]        102/832       126/850        0.803   -0.219         48.801
GASP [43]         32/317        38/309         0.801   -0.222         15.440
PARIS [44]        85/810        52/406         0.798   -0.226         28.409
AMIS [45]         246/2267      219/2257       1.133    0.125        103.985
ISIS-2 [46]       1570/8587     1720/8600      0.895   -0.111        663.923

There were no obvious reasons for removing
AMIS from the meta-analysis, other than the
invalid one that its results differed significantly
from those of the first five studies. The state of
knowledge was therefore ambiguous: the first
five studies collectively pointed to a statistically 
significant and fairly strong effect of aspirin 
for secondary prevention after a myocardial 
infarction, but the first six provided, at best, 
only suggestive evidence for the effectiveness of 
aspirin. This ambiguity was not resolved until 
the results of a seventh study, the Second International
Study of Infarct Survival (ISIS-2) [46],
were published. For the sake of comparability 
across all studies, the analyses below are based 
on the 2-year all-cause mortality rates from 
ISIS-2. 

According to the DerSimonian-Laird random
effects analysis, the overall OR across all seven
studies was equal to a barely statistically significant
0.88 (χ² = 4.36, df = 1, p < 0.05), with a
95% confidence interval extending from 0.77 to
0.99. The fixed effects analysis of these seven
studies suggested (inappropriately, we believe)
sharper conclusions: a highly significant point
estimate for the OR of 0.90 (χ² = 10.6, df = 1,
p < 0.005), with a narrow 95% confidence interval
extending from 0.84 to 0.96. We question the
validity of the latter analysis because significant
between-study variation remains (the OR for
AMIS is not only significantly different from the
average OR for the first five studies, as found
before, it is significantly different from the OR
for ISIS-2 (χ² = 4.95, df = 1, p < 0.05)).

The overall substantive conclusion from this
meta-analysis is that aspirin seems to be a
modestly effective agent for reducing the risk
of death during a period of approximately
2 years after a myocardial infarction, with a
percentage reduction in the odds for dying
relative to placebo equal to approximately 10%.
The limits of uncertainty about this value
are unsure, with the conservative random
effects approach yielding a much wider confidence
interval than the anticonservative fixed
effects approach (in both instances, though,
the upper confidence bound was less than
1.0).

The major methodological conclusion is
that a single study may exert a powerful effect
on one's conclusions. Here, there were two
such influential studies. AMIS undid the statistically
significant effect of aspirin found in the
first five studies, and ISIS-2 undid the statistically
non-significant effect of aspirin found in
the first six. Given that one cannot know
whether or when the next and potentially decisive
study will be conducted, it would be prudent
always to attach greater uncertainty than
provided by traditional confidence intervals to
the results of a meta-analysis of the studies
conducted to date.
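
The fragility described in this section can be made concrete with a cumulative analysis: pool the first five, first six, and all seven trials of Table 1 in turn, once under the fixed effects model and once under a random effects model. The sketch below works from the y and w columns of Table 1 and assumes the standard moment estimator of between-study variance; the helper name `pool` and the dictionary layout are illustrative, not part of the original analysis.

```python
import math
from typing import Dict, List, Tuple

# (study, y = ln(OR), w = 1/Var(y)) in the order listed in Table 1.
STUDIES: List[Tuple[str, float, float]] = [
    ("MRC-1", -0.329, 25.710),
    ("CDP", -0.384, 24.291),
    ("MRC-2", -0.219, 48.801),
    ("GASP", -0.222, 15.440),
    ("PARIS", -0.226, 28.409),
    ("AMIS", 0.125, 103.985),
    ("ISIS-2", -0.111, 663.923),
]


def pool(studies, random_effects: bool) -> Tuple[float, float, float]:
    """Return (OR, lower, upper), rounded to 2 decimals, for a set of studies."""
    y = [s[1] for s in studies]
    var = [1.0 / s[2] for s in studies]
    w = [1.0 / v for v in var]
    sum_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    if random_effects and len(y) > 1:
        # Assumed moment estimator of the between-study variance tau2.
        q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
        c = sum_w - sum(wi * wi for wi in w) / sum_w
        tau2 = max(0.0, (q - (len(y) - 1)) / c)
        w = [1.0 / (v + tau2) for v in var]
        sum_w = sum(w)
        mean = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    half_width = 1.96 * math.sqrt(1.0 / sum_w)
    limits = (mean, mean - half_width, mean + half_width)
    return tuple(round(math.exp(x), 2) for x in limits)


cumulative: Dict[int, Dict[str, Tuple[float, float, float]]] = {
    k: {
        "fixed": pool(STUDIES[:k], random_effects=False),
        "random": pool(STUDIES[:k], random_effects=True),
    }
    for k in (5, 6, 7)
}
for k, row in cumulative.items():
    print(k, row["fixed"], row["random"])
```

Run on the rounded table entries, the sketch returns the summary ORs and intervals quoted in the text for five, six, and seven studies, and shows the random effects interval widening once AMIS introduces heterogeneity while the fixed effects interval narrows.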

Execution and reporting 

Several authors have proposed guidelines for
carrying out and publishing the results of meta-analyses
[3,16,47]. Although these guidelines are
presented specifically for the meta-analysis of
randomized clinical trials, they apply, with only
minor exceptions, to meta-analyses in epidemiology
as well. In our opinion, the standards and
criteria offered by Sacks et al. [3], including, as
they do, most of the others’ guidelines, are the
most useful. The six areas within which a meta-analysis
should be evaluated, with those of their
subdivisions that pertain to both clinical trials
and epidemiology, follow:

(A) Study design. Just as the individual studies
being meta-analyzed should be rigorously designed,
with the design carefully and completely
described, so should the meta-analysis itself.
The meta-analysis should be carried out in
accordance with a protocol prepared before the
initiation of the study. The report of its results
should describe the methods used by the meta-analysts
to find all relevant articles, abstracts,
chapters, etc.; should list the studies analyzed
and enumerate those that were excluded (with
reasons for their exclusion); and should provide
summary data on the clinical and demographic
characteristics of the subjects in the studies
(subtypes of lung cancer, for example, in
case-control studies).

(B) Combinability. The authors should
address the statistical issue of whether the results
from the separate studies should have been
combined. If the estimates of treatment effect in
clinical trials or of exposure-illness association
in epidemiological studies differed significantly
one from another, and especially if there was
evidence of “qualitative interaction” (the estimates
being in one direction in some studies and
in the other direction in others), the authors
should discuss why they proceeded to pool the
results from all the studies.

(C) Control and measurement of potential bias.
Several sources of unconscious bias exist, each
of which should be addressed in the protocol for
the meta-analysis. The more important ones
would be discussed in the publication reporting
on the results of the meta-analysis. Bias may
occur in the decision as to which studies to select
and which to exclude. Ideally, the decision
should be made by one or more reviewers who
concentrate on the study’s methods and are kept
blinded to the study’s results.

Bias may exist in the process of extracting 
the summary estimates of effect or association 
from the publication of the study’s results. This 
is especially likely to occur in epidemiological 
studies, in which many actual or potential 
confounding variables are controlled—separ¬ 
ately or in combination—and in which many 
estimates of relative risk are produced. Depend¬ 
ing on their predilections, reviewers might 
tend to choose the largest estimate, the smallest 
estimate, or, in order to be fair, the estimate 
closest to the average of the individual ones. 
A solution may be to have two or more review¬ 
ers carry out the data-extraction indepen¬ 
dently, and to resolve any disagreement by 
having them discuss the study and reach 
consensus. 

Sacks and his colleagues recommend, finally,
that the sources of support for the meta-analysis 
be identified. 

(D) Statistical analysis. Depending on the 
nature of the response variable, quantitative or 
categorical, either an analysis of variance [24] 
or a variation of the Mantel-Haenszel pro¬ 
cedure [34], both of which properly average 
within-study differences, should be employed. 
Point and interval estimation are desirable 
in addition to significance tests. If the meta¬ 
analysis fails to demonstrate a significant over¬ 
all effect or association, the possibility of 
inadequate power should be considered. When 
specific effects or associations within sub¬ 
groups were hypothesized a priori, separate 
meta-analyses should be performed within those 
subgroups. 

(E) Sensitivity analysis. When possible, the 
studies’ results should be analyzed in two or 
more ways in order to confirm that the final 
result from the meta-analysis is qualitatively the 
same no matter how the results are analyzed. 
The quality of the individual studies should be 
determined and incorporated into the final con¬ 
clusions from the meta-analysis. The possible 
impact of publication bias and of the “file 
drawer problem” should be carefully con¬ 
sidered. 

(F) Application of results. Bringing to bear all 
of the above considerations, the meta-analysts 
should come to a decision as to whether the 
pooled results provide a definitive, effectively 
final answer to the research question, or whether 
the conclusions are tentative and further indi¬ 
vidual studies are needed. 


III. APPLICATIONS OF META-ANALYSIS
TO EPIDEMIOLOGICAL STUDIES OF
EXPOSURE TO ETS AND LUNG CANCER

An important application of meta-analysis
was the analysis by the National Research
Council (NRC) [48] of all known epidemiological
studies (through 1986) of the hypothesized
association between a non-smoker's exposure at
home to environmental tobacco smoke (ETS)
and the risk of lung cancer. The overall OR
found by the NRC was a statistically significant
1.34 (p < 0.001), with a 95% confidence interval
extending from 1.18 to 1.53. Among the
criticisms of the NRC's meta-analysis to be
addressed subsequently is that many biases in
the individual studies that should have been
accounted for were not.

Four studies were excluded from the NRC’s 
meta-analysis, for apparently valid reasons: no 
reference population was given, no raw data 
were presented, etc. Aside from their specifica¬ 
tion of the reasons for the exclusion of these 
four studies, the authors of the NRC report 
appear not to have followed the major guide¬ 
lines proposed by Sacks et al. [3]. For example, 
they did not provide a formal protocol for 
the meta-analysis, nor, apparently, did they 
give any consideration to the possibility of 
heterogeneous ORs across the several studies. 

In addition, most of the decision points and 
sources of bias discussed in Section II in connec¬ 
tion with the meta-analysis of clinical trials also 
apply to the meta-analysis of epidemiological 
studies. For example, studies that fail to show 
an effect of treatment are often not published 
either due to a “publication bias,” i.e. articles
that fail to show an effect of treatment are often 
rejected for publication, or due to the “file 
drawer phenomenon,” i.e. there is a tendency on 
the authors’ part not to submit for publication 
an article that fails to show an effect. These two 
related sources of bias are clearly present in 
epidemiological studies as well as in clinical 
trials. For example, it may be that a study 
comparing the incidence of lung cancer among 
non-smokers exposed to ETS against the inci¬ 
dence of lung cancer among non-smokers not so 
exposed yielded a relative risk substantially less 
than one. The world of science, as it is today, 
might well preclude the publication of such a 
study [49]. 

Furthermore, the question comes to mind
whether the existing epidemiological studies of
a possible association between exposure to ETS
and the incidence of lung cancer in non-smokers
are of adequate quality. Indeed, there is the
question whether any of these studies meets
even minimal standards of quality [50, 51].

Letzel et al. [51] considered the effects of
misclassification errors on the results of the
NRC's meta-analysis. They assumed three sets
of conservative rates of misclassification for
both disease and exposure to ETS, and, after
adjusting for misclassification, concluded that
“taking all this evidence together our calculations
show that the findings of all studies
about female lung cancer from passive smoking
are consistent with the null hypothesis.” Their
final statement, also worth quoting, reads: "This
brings us to final conclusion that there are 
presently only 2 alternatives—accepting the null 
hypothesis or creating new empirical evidence 
by performing a really good study."

We shall nevertheless meta-analyze the nine
American epidemiological studies that have, to
our knowledge, been performed: the five that
were included in the NRC report [48] plus four
more recent ones. It is important to remind
ourselves beforehand that there are many biases
and confounders that will tend to inflate the
relative risk. An especially important one is the
misclassification of actual smokers as non-smokers.
As Lee [52, 53] and Letzel et al. [51]
point out, a woman who claims to be a non-smoker
is more likely to be or to have been an
actual smoker if married to a smoker than if
married to a non-smoker. Other sources of bias
include the misclassification of disease (i.e. misclassifying
a non-lung cancer patient as having
lung cancer as the primary disease, and misclassifying
a lung cancer patient as not having
cancer [51]), differing lifestyles between households
where tobacco is used and those where
tobacco is not used, and differential exposure
and duration of exposure to air pollution between
the exposed and unexposed groups. These
biases have been considered to varying degrees
by researchers in the field, with generally little
success in controlling them.

One major source of bias that has not been 
analyzed sufficiently thoroughly, and has not yet 
been adequately controlled, is the misclassifi¬ 
cation of the spouse’s smoking history (in this 
discussion the spouse is the husband or wife of 
the patient or of the control). It is possible that 
a non-smoking woman with lung cancer will 
overestimate the amount or duration of her 
husband's smoking in an attempt to find a 
causal explanation for her disease. The same 
tendency might be expected to exist when it is a
surrogate for the patient (a child or sibling, or
the spouse himself) who is being asked to report
on the spouse's smoking history. The latter
point is important because the proportion of 
patients reported on by a surrogate exceeds 
50% in some studies. 

U.S. studies of ETS and lung cancer

There are many reasons for restricting attention
to American studies of whether there is an
elevated risk of lung cancer to non-smokers
exposed to ETS relative to non-smokers not so
exposed. One is that this is the population to
whom policy decisions will apply and on whom
those decisions should be based. Another is that
the summary ORs in the individual studies are
derived from distributions of smoking amounts
and durations, and of brands of cigarettes and
other tobacco products, that pertain to populations
within the U.S., and may thus be
expected to be relatively homogeneous. Odds
ratios from studies in other countries, on the
other hand, are derived from distributions that
may differ markedly from those in the U.S., and
thus the ORs themselves may not be relevant to
the American experience. Genetic and lifestyle
differences between the U.S. population and the
populations studied elsewhere (mainly in east
Asia) also argue for a meta-analysis only of U.S.
studies.

The first U.S. study, by Garfinkel [54], was a
prospective follow-up study of more than
175,000 women who reported themselves to be
non-smokers. All types of cancer of the lung
were taken as end points. A "non-smoker" in
this study was not only a woman who reported
that she never smoked, but also one who reported
that she smoked only occasionally but
not regularly. Little if any attempt seems to have
been made to verify these women's self-reports.

The remaining U.S. studies were all case-control
studies comparing patients with lung
cancer against one or another kind of comparison
group. The first was the study in New
Orleans by Correa et al. [55]. Controls were
patients with other diseases, from the same
hospitals as the lung cancer cases, who were
matched to the cases on age, sex and race.
Specially trained interviewers were relied on to
obtain exposure data for the cases and controls,
although it is not clear whether the interviewers
were blinded to whether a patient was a case or
a control. The next of kin served as a proxy for
the patient in 24% of the cases and 11% of the
controls.

In the study by Kabat and Wynder [56], lung
cancer patients in six cities were identified, and
controls were matched to the cases on age, sex,
race, date of interview and hospital. More care
seems to have been taken in this study than in
the others to ensure that subjects classified as
non-smokers were truly such. Interviewers used
a standardized questionnaire, but it is unclear
whether they were blinded to the status of the
patient as a case or a control.

The study of Buffler et al. [57] was conducted
in six coastal counties in Texas. Little information
is provided about such key features of
the study as the criteria for classifying the
spouse as a "regular smoker" or not, whether
ex-smokers were included or excluded, whether
the interviewers were blinded, and the number
of patients for whom a surrogate interviewer
was required.

Garfinkel et al. [58] studied cases and controls
from hospitals in New Jersey and Ohio. Controls
were patients with colorectal cancer,
matched to the cases on age and hospital. The
interviewers were kept blinded to the status of
each patient. Women were counted as unexposed
to ETS even if their husbands smoked
cigarettes "only occasionally." There was extensive
reliance on surrogate interviewees, many
with questionable knowledge about the patient:
approximately one quarter of all interviews were
with someone other than the patient, the spouse
or a child, about 60% were with the spouse or
a child, and only 12% were with the patient
herself.

Fig. 2. Odds ratios and 95% confidence intervals for nine
U.S. epidemiological studies of the hypothesized association
between exposure to environmental tobacco smoke and lung
cancer.


Table 2. Summary results of U.S. epidemiologic studies of
the association between a nonsmoking woman's exposure
to environmental tobacco smoke and the risk of lung cancer

Study                       OR     y = ln(OR)   w = 1/Var(y)

Garfinkel [54]              1.17      0.157        73.730
Correa et al. [55]          2.02      0.703         4.745
Kabat and Wynder [56]       0.79     -0.233         3.061
Buffler et al. [57]         0.80     -0.226         4.777
Garfinkel et al. [58]       1.12      0.113        22.330
Wu et al. [59]              1.22      0.199         7.545
Brownson et al. [60]        1.68      0.519         2.310*
Humble et al. [61]          1.78      0.577         3.826
Varela [62]                 0.91     -0.090        36.268

*From personal communication from Dr Brownson.

The case-control study by Wu et al. [59],
conducted in Los Angeles County, was the
first to use neighborhood rather than hospitalized
controls. Only cases who were still alive
were interviewed (all on the telephone); i.e. no
surrogates were permitted for cases who had
died or who refused to be interviewed. No
information was provided as to whether the
interviewers were blinded. The point estimate of
the OR given by the authors in their paper's
abstract and in the text on p. 748, OR = 1.2, is
inconsistent with the confidence interval reported
in both of those places, 0.5-3.3 (the
geometric mean of the limits must equal the
point estimate). Instead of working with these
incorrect values, we used in Table 2 and in Fig.
2 of this paper the values for "spouse smoked"
for adenocarcinoma in their Table 2: OR = 1.2,
with a 95% confidence interval extending from
0.6 to 2.5.
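
The consistency check described above can be sketched in a few lines of Python. The sketch assumes that each interval was computed on the log scale as exp(ln OR ± 1.96 se), in which case the geometric mean of the limits equals the point estimate and the width of the interval determines the standard error. The function names are illustrative only and are not taken from any of the cited papers.

```python
import math


def geometric_mean(lower, upper):
    """Point estimate implied by a log-scale confidence interval."""
    return math.sqrt(lower * upper)


def weight_from_ci(lower, upper, z=1.96):
    """Weight 1/se^2 implied by a 95% interval for the OR (log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return 1.0 / se ** 2


gm_abstract = geometric_mean(0.5, 3.3)  # about 1.28, not the reported 1.2
gm_table2 = geometric_mean(0.6, 2.5)    # about 1.22
w_wu = weight_from_ci(0.6, 2.5)         # about 7.54

print(round(gm_abstract, 2), round(gm_table2, 2), round(w_wu, 3))
```

Under this assumption the interval 0.6 to 2.5 implies a weight that appears to agree with the value listed for Wu et al. [59] in Table 2.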

Brownson et al. [60] carried out their
case-control study in Denver. The controls were
patients with cancer of the colon or bone marrow
and were matched according to age and sex
(there was approximately a 50:50 split on sex
for the patients with lung cancer). The interviewer
was blinded to the case or control status
of the patient. The interviewee was someone
other than the patient (mainly the spouse but
occasionally a sibling or child) for nearly 70%
of the cases and almost 40% of the controls.
Exposure to ETS was not dichotomized in their
Table 4 as "none" vs "any" but as "less than
four hours per day" vs "four or more". The
95% confidence interval for the OR there
should extend from 0.46 to 6.10 (personal communication
from the senior author).

The study by Humble et al. [61] was a
population-based case-control study in New Mexico.
Controls were obtained by random digit dialing
or from a randomly generated list of Medicare
recipients. They were selected to match the
frequency distributions of the cases on sex,
ethnicity and age. The patient’s status as a
never-smoker was checked against the information
recorded in the hospital chart. More
than half of the time a surrogate was relied on
for exposure information about the case. It
is unclear whether the interviewers were kept
ignorant of the subject's status.

Varela’s study [62] was carried out in New
York State, with the controls being selected
from State motor vehicle records. They were
matched to the cases on age, sex, county of
residence and previous smoking history. The
questionnaire administered to the cases was
slightly different from the one administered to
the controls, so blinding was obviously impossible.
Surrogate interviewees were permitted for
the one-third of cases who had died or could
not be interviewed for other reasons, but in
such instances a surrogate respondent was
interviewed for the matched control.

A final publication pertaining to experience in
the U.S. is by Dalager et al. [63]. Rather than
being a report on a new study, this paper reports
the results of a summarization of data reported
on earlier by Correa et al. [55] and by Buffler et
al. [57], plus data that apparently have not yet
been published anywhere. The results of the
analysis may not be totally valid because “the
data from all three study areas were merged”
instead of being combined using the methods
described in the Appendix. In any event, to have
included the results of the quasi-meta-analysis
by Dalager et al. [63] in our meta-analysis would
have resulted in the studies by Correa et al. [55]
and Buffler et al. [57] inappropriately being
counted twice.

The overall quality of the American studies is 
obviously quite variable, just as it is for all such 
studies world-wide. Because we did not develop 
a priori a set of procedures for the unbiased 
measurement of a study's quality, it is appropri¬ 
ate that we include all known U.S. studies in our 
meta-analysis [54-62]. Most of these studies
reported several values for the odds ratio. We 
selected for analysis one value per study, the 
value we believe the authors took to be their 
most accurate measure of association between 
exposure to ETS and lung cancer. These usually 
agreed with the values selected by the NRC [48] 
and by Layard [50] in their meta-analyses. 

The results are presented in Table 2 and
Fig. 2. There is no evidence for study-to-study
heterogeneity (the value of the chi-square statistic
with 8 df is a non-significant 5.46). The OR
of 1.17 for the single prospective study, that by
Garfinkel for the American Cancer Society [54],
is close in value to the average OR of 1.07 for
the eight case-control studies [55-62]. The overall
OR across all nine studies is equal to a
statistically non-significant 1.12 (chi-square = 1.88;
df = 1), with the 95% confidence interval extending
from 0.95 to 1.30. The fact that no
significant association was found neither vindicates
nor condemns the meta-analysis of these
epidemiological studies. Given the biases that
exist in each individual study, the safest conclusion
from the present meta-analysis is a
negative one: there is no convincing scientific
evidence from the epidemiological literature of
an association between exposure to ETS and the
risk of lung cancer in the U.S.
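
The pooled figures quoted above can be checked against the y and w columns of Table 2 using the fixed effects formulas of the Appendix. The following Python sketch does so; the values are transcribed from Table 2 (with 0.113 assumed for the log OR of Garfinkel et al. [58], since ln 1.12 = 0.113), and the function and variable names are illustrative rather than part of the original analysis.

```python
import math

# (study, y = ln(OR), w = 1/Var(y)), transcribed from Table 2.
TABLE_2 = [
    ("Garfinkel [54]", 0.157, 73.730),
    ("Correa et al. [55]", 0.703, 4.745),
    ("Kabat and Wynder [56]", -0.233, 3.061),
    ("Buffler et al. [57]", -0.226, 4.777),
    ("Garfinkel et al. [58]", 0.113, 22.330),  # 0.113 assumed, see lead-in
    ("Wu et al. [59]", 0.199, 7.545),
    ("Brownson et al. [60]", 0.519, 2.310),
    ("Humble et al. [61]", 0.577, 3.826),
    ("Varela [62]", -0.090, 36.268),
]


def fixed_effects(rows):
    """Weighted mean log OR, 95% interval, overall chi-square and Q."""
    sum_w = sum(w for _, _, w in rows)
    y_bar = sum(w * y for _, y, w in rows) / sum_w
    half_width = 1.96 / math.sqrt(sum_w)
    q = sum(w * (y - y_bar) ** 2 for _, y, w in rows)
    return {
        "or": math.exp(y_bar),
        "ci": (math.exp(y_bar - half_width), math.exp(y_bar + half_width)),
        "chi2": y_bar ** 2 * sum_w,  # 1 df test of overall OR = 1
        "q": q,                      # heterogeneity statistic
        "df": len(rows) - 1,
    }


all_nine = fixed_effects(TABLE_2)
case_control = fixed_effects(TABLE_2[1:])  # the eight case-control studies

print(round(all_nine["or"], 2), [round(x, 2) for x in all_nine["ci"]])
print(round(all_nine["q"], 2), all_nine["df"], round(all_nine["chi2"], 2))
print(round(case_control["or"], 2))
```

Run on these inputs, the sketch gives values that agree, to rounding, with the overall OR, confidence interval and chi-square statistics reported in the text.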

IV. CONCLUSIONS 

Meta-analyses, when properly performed,
can be used effectively in both clinical trials
and epidemiological studies for the following
purposes:

• To increase the power of statistical tests for 
important endpoints and subgroups. 

• To make sense out of studies with 
conflicting conclusions. 

• To improve estimates of effect size. 

However, uncritical use of meta-analysis can 
and does lead to unsubstantiated conclusions. 
Only when all the issues that we have discussed 
are considered and properly accounted for is it 
possible to apply meta-analysis to combine 
studies so that the overall result is scientifically 
valid. These issues include publication bias, 
the question of heterogeneity across studies, 
whether all subjects should be included in the 
meta-analysis or only those who are compliant 
with their treatment (this pertains only to clini¬ 
cal trials), whether proper control or adjust¬ 
ments have been made in epidemiological 
studies for sociodemographic or clinical differ¬ 
ences between study populations, and the poss¬ 
ible misclassification of subjects with regard to 
levels of exposure and case-control status. 

It is very unlikely that the biases present in 
the epidemiological studies of the possible 
association between exposure to ETS and the 
risk of lung cancer can ever be removed. The 
meta-analysis performed by the NRC [48] must 
either be completely discounted or, as Stein [23] 
concluded so succinctly in another context, 
considered a mere “computational exercise.”

APPENDIX

Statistical Appendix: Analysis of Log Odds Ratios

Suppose that there are, all told, $S$ studies to be meta-analyzed.
In a typical one, say study $s$, let $n_{s1}$ and $n_{s2}$ be the
sample sizes in the two groups being compared and let $p_{s1}$
and $p_{s2}$ be the proportions having the characteristic under
study. In a randomized controlled trial the two groups
would be the treated and placebo samples and the characteristic
under study might be relapse or some other kind of
failure. In an epidemiological case-control study the two
groups would be the cases and controls and the characteristic
under study would be exposure to the hypothesized risk
factor. A review of the fixed effects analysis of the data
follows.

The logarithm of the OR in study $s$, denoted by $y_s$, is
equal to

\[ y_s = \ln\left\{ \frac{p_{s1}(1 - p_{s2})}{p_{s2}(1 - p_{s1})} \right\}, \]

where ln denotes natural logarithm. The standard error of
$y_s$ is given by

\[ \mathrm{se}_s = \sqrt{ \frac{1}{n_{s1} p_{s1}(1 - p_{s1})} + \frac{1}{n_{s2} p_{s2}(1 - p_{s2})} }, \]

and the factor by which $y_s$ is weighted in the classical fixed
effects analysis is

\[ w_s = 1/(\mathrm{se}_s)^2. \]

The overall OR across all $S$ studies is equal to

\[ \mathrm{OR} = \exp(\bar{y}), \]

where $\bar{y} = \sum w_s y_s / \sum w_s$, and the limits of a 95% confidence
interval for the overall OR are given by

\[ \exp\left( \bar{y} \pm 1.96 / \sqrt{\textstyle\sum w_s} \right). \]

This interval will not be symmetric about OR.

The “combinability” of the $S$ studies, i.e. the hypothesis
that the $S$ underlying ORs are equal, may be tested by
referring the statistic

\[ Q = \sum w_s (y_s - \bar{y})^2 \]

to percentage points of the chi-square distribution with
$S - 1$ df. This same statistic $Q$ plays a central role in the
DerSimonian-Laird random effects analysis of the data [39].
In particular, the DerSimonian-Laird analysis is identical to
the fixed effects analysis just presented if $Q \le S - 1$, but the
two methods diverge if $Q > S - 1$.

Assume, therefore, that $Q > S - 1$, and define

\[ D = \frac{ (Q - (S - 1)) \sum w_s }{ \left(\sum w_s\right)^2 - \sum w_s^2 }. \]

The DerSimonian-Laird weighting factor for study $s$ is
equal to

\[ w_s^* = \frac{1}{D + 1/w_s}, \]

and the random effects point and interval estimates of the
overall OR become

\[ \exp(\bar{y}^*) \]

and

\[ \exp\left( \bar{y}^* \pm 1.96 / \sqrt{\textstyle\sum w_s^*} \right), \]

where $\bar{y}^* = \sum w_s^* y_s / \sum w_s^*$.

Source: https://www.industrydocuments.ucsf.edu/docs/lhyx0000